On the Error Probability of Model Selection for Classification

Author

  • Joe Suzuki
Abstract

We consider model selection based on information criteria for classification. In classification, a class $y$ is guessed from an attribute $x$ based on the true conditional probability $P(y|x)$, where $x \in X$, $y \in Y$, and $X$ and $Y$ are infinite and finite sets, respectively. In model selection, given examples, we select the model that minimizes an information criterion. The information criteria we address in this paper are expressed in the form of the empirical entropy plus a compensation term $(k(g)/2)\,d(n)$, where $k(g)$ is the number of independent parameters in a model $g$, $d(n)$ is a function of $n$ by which the information criterion is characterized, and $n$ is the number of examples. We derive for arbitrary $d(\cdot)$ the asymptotically exact error probability in model selection. Although it was known for autoregressive processes that $2\,d(n) = \log \log n$ is the minimum function of $n$ such that the model selection satisfies strong consistency, whether the same holds for classification has remained an open problem. We solve this problem in the affirmative. Additionally, we derive, for the $d(\cdot)$ that satisfy weak consistency, the expected Kullback-Leibler divergence between the true conditional probability $P(y|x)$ and the conditional probability $\hat{P}(y|x)$ estimated by the model selection and a parameter estimator. The derived value is $k(g^*)/(2n)$, where $g^*$ is the true model, and the accumulated value over $n$ time instances is computed as $(k(g^*)/2) \log n + O(1)$, which implies the optimality of a predictive coding based on the model selection.
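To make the form of the criterion concrete, the following is a minimal Python sketch of selecting a model by minimizing "empirical entropy plus $(k(g)/2)\,d(n)$". It is an illustration under stated assumptions, not the paper's construction: each candidate model is assumed to partition the attribute space into finitely many cells, $k(g)$ is assumed to be (number of cells) times $(|Y| - 1)$, the empirical entropy is taken as the total over all $n$ examples in nats, and all function and model names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(pairs):
    """Total empirical conditional entropy of y given the cell s, in nats.

    Computes -sum over (s, y) of n(s, y) * ln(n(s, y) / n(s)).
    """
    joint = Counter(pairs)
    marginal = Counter(s for s, _ in pairs)
    return -sum(c * math.log(c / marginal[s]) for (s, _), c in joint.items())


def criterion(xs, ys, cell_of, num_cells, num_classes, d):
    """Empirical entropy plus the compensation term (k(g) / 2) * d(n)."""
    n = len(xs)
    pairs = [(cell_of(x), y) for x, y in zip(xs, ys)]
    k = num_cells * (num_classes - 1)  # assumed parameter count k(g)
    return empirical_entropy(pairs) + 0.5 * k * d(n)


def select_model(xs, ys, models, num_classes, d):
    """Return the name of the candidate model with the smallest criterion."""
    return min(
        models,
        key=lambda name: criterion(xs, ys, *models[name], num_classes, d),
    )


# Hypothetical candidates: (cell-assignment function, number of cells).
models = {
    "one_cell": (lambda x: 0, 1),
    "two_cells": (lambda x: int(x >= 0.5), 2),
    "four_cells": (lambda x: min(int(x * 4), 3), 4),
}

# Synthetic examples whose class depends only on whether x >= 0.5.
xs = [i / 100 for i in range(100)]
ys = [int(x >= 0.5) for x in xs]

# d(n) = ln n is one possible choice of the characterizing function.
print(select_model(xs, ys, models, num_classes=2, d=math.log))
```

Passing a different function as `d` changes only the compensation term, which is the sense in which $d(n)$ characterizes the criterion in the abstract.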


Similar resources

Minimum error rate training for designing tree-structured probability density function

In this paper, we propose a signal prototype classification and evaluation framework in acoustic modeling. Based on this framework, a new tree-structured likelihood function is derived. It uses a designated cluster kernel f m for signal prototype classification and a designated cluster kernel f m for likelihood evaluation of outlier or tail events of the cluster. A minimum classification error (MC...


Cloud Classification Using Error-Correcting Output Codes

Novel artificial intelligence methods are used to classify 16x16 pixel regions (obtained from Advanced Very High Resolution Radiometer (AVHRR) images) in terms of cloud type (e.g., stratus, cumulus, etc.). We previously reported that intelligent feature selection methods, combined with nearest neighbor classifiers, can dramatically improve classification accuracy on this task. Our subsequent analy...


The Error-reject Tradeoff

We investigate the error versus reject tradeoff for classifiers. Our analysis is motivated by the remarkable similarity in error-reject tradeoff curves for widely differing algorithms classifying handwritten characters. We present the data in a new scaled version that makes this universal character particularly evident. Based on Chow's theory of the error-reject tradeoff and its underlying Bayesian ana...


Image Classification Based on a Multiresolution Two Dimensional Hidden Markov Model

This paper presents an image classification algorithm using a multiresolution two dimensional hidden Markov model (HMM). The multiresolution two dimensional hidden Markov model is an extension from the two dimensional hidden Markov model for image classification. A classifier estimates model parameters using the EM algorithm. Classification is then performed according to the maximum a posteriori pr...


Minimax Nonparametric Classification - Part I: Rates of Convergence

This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say $f$, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under square $L_2$ loss is determined by the massiveness of the class as measured by metric entropy. The secon...



Publication date: 2007